Regression is well suited to understanding how one continuous variable predicts another continuous variable.
We need special procedures to understand how a categorical variable predicts a continuous variable.
What is the association of citywide diversity and region of the country with neighborhood-level diversity?
Data Source: New York Times 538 Blog
The orange line below is a theory-driven line.
The red line below is a regression line.
| Color | Region |
|---|---|
| green | West |
| red | SouthEast |
| blue | NorthEast |
| purple | Midwest |
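Essentially, we want to create a set of yes/no (1/0) indicator variables, one for each value of the categorical variable. One category is left out of the regression as the reference group; in the model below, that is the Midwest. SPSS has a function to do this automatically.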
creating indicator variables
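As a rough sketch of what indicator coding does, the Python snippet below builds one 0/1 column per region by hand. The region list is made up for illustration (it is not the course data), `make_indicators` is a hypothetical helper, and the Midwest is assumed to be the reference category because it has no indicator in the regression model.

```python
# Hypothetical region labels for five cities (illustrative, not the course data).
regions = ["West", "Midwest", "NorthEast", "SouthEast", "West"]


def make_indicators(values, reference):
    """Return one 0/1 column per category, omitting the reference category."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Assumption: Midwest is the reference group, so it gets no column.
indicators = make_indicators(regions, reference="Midwest")

for name, column in indicators.items():
    print(name, column)
```

A Midwest city ends up with 0 on every indicator, which is what makes it the comparison group for the other regions.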
\[NeighborhoodDiversity_i = \beta_0 + \beta_1 CitywideDiversity_i + \beta_2 NorthEast_i + \beta_3 SouthEast_i + \beta_4 West_i + e_i\]
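To see what the indicators do, write the coefficient on CitywideDiversity as \(\beta_1\) and the coefficient on West as \(\beta_4\) (labels assumed here for illustration). For a Midwest city every indicator equals 0, while for a West city only the West indicator equals 1, so the model reduces to:

\[\text{Midwest: } NeighborhoodDiversity_i = \beta_0 + \beta_1 CitywideDiversity_i + e_i\]

\[\text{West: } NeighborhoodDiversity_i = (\beta_0 + \beta_4) + \beta_1 CitywideDiversity_i + e_i\]

Each region coefficient therefore shifts the intercept relative to the Midwest, while the slope on citywide diversity is the same in every region.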
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -0.01911 | 0.02766 | -0.6908 | 0.4914 |
| CITYWIDE_DIVERSITY_INDEX | 0.7457 | 0.04407 | 16.92 | 2e-30 |
| REGIONnortheast | 0.003935 | 0.01957 | 0.2011 | 0.8411 |
| REGIONsoutheast | 0.02858 | 0.01621 | 1.763 | 0.08115 |
| REGIONwest | 0.08945 | 0.01595 | 5.606 | 2.019e-07 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 100 | 0.05206 | 0.7915 | 0.7827 |
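The estimates above can be turned into predictions. The sketch below copies the coefficients from the table and compares a Midwest city with a West city at the same citywide diversity. The value 0.5 is a hypothetical input chosen for illustration, and `predict` is a helper name introduced here, not part of any package.

```python
# Coefficients copied from the regression table.
INTERCEPT = -0.01911
SLOPE_CITYWIDE = 0.7457

# Region shifts relative to the Midwest (the reference group, so its shift is 0).
REGION_SHIFT = {
    "midwest": 0.0,
    "northeast": 0.003935,
    "southeast": 0.02858,
    "west": 0.08945,
}


def predict(citywide_diversity, region):
    """Predicted neighborhood diversity for a city in the given region."""
    return INTERCEPT + SLOPE_CITYWIDE * citywide_diversity + REGION_SHIFT[region]


citywide = 0.5  # hypothetical citywide diversity index
for region in REGION_SHIFT:
    print(region, round(predict(citywide, region), 3))
```

At a citywide index of 0.5, the predicted neighborhood diversity is about 0.354 for a Midwest city and about 0.443 for a West city. The gap of 0.08945 is exactly the West coefficient, and it is the same at any level of citywide diversity.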